Counting Co-occurrences in Citations to Identify Plagiarised Text Fragments

نویسندگان

  • Solange de L. Pertile
  • Paolo Rosso
  • Viviane Pereira Moreira
چکیده

Research in external plagiarism detection is mainly concerned with the comparison of the textual contents of a suspicious document against the contents of a collection of original documents. More recently, methods that try to detect plagiarism based on citation patterns have been proposed. These methods are particularly useful for detecting plagiarism in scientific publications. In this work, we assess the value of identifying co-occurrences in citations by checking whether this method can identify cases of plagiarism in a dataset of scientific papers. Our results show that most the cases in which co-occurrences were found indeed correspond to plagiarised passages.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

On Automatic Plagiarism Detection Based on n-Grams Comparison

When automatic plagiarism detection is carried out considering a reference corpus, a suspicious text is compared to a set of original documents in order to relate the plagiarised text fragments to their potential source. One of the biggest difficulties in this task is to locate plagiarised fragments that have been modified (by rewording, insertion or deletion, for example) from the source text....

متن کامل

A New Document Embedding Method for News Classification

Abstract- Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. There is lots of news spreading on the web. A text classifier can categorize news automatically and this facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...

متن کامل

Hierarchical Topic Structuring: From Dense Segmentation to Topically Focused Fragments via Burst Analysis

Topic segmentation traditionally relies on lexical cohesion measured through word re-occurrences to output a dense segmentation, either linear or hierarchical. In this paper, a novel organization of the topical structure of textual content is proposed. Rather than searching for topic shifts to yield dense segmentation, we propose an algorithm to extract topically focused fragments organized in ...

متن کامل

Distilling Conceptual Connections from MeSH Co-Occurrences

Our aim is to contribute to biomedical text extraction and mining research. In this paper we present exploratory research on the MeSH terms assigned to MEDLINE citations. We analyze MeSH based co-occurrences and identify the interesting ones, i.e., those that are likely to be semantically meaningful. For each selected co-occurring pair we derive a weighted vector representation that emphasizes ...

متن کامل

The complexity of counting poset and permutation patterns

We introduce a notion of pattern occurrence that generalizes both classical permutation patterns as well as poset containment. Many questions about pattern statistics and avoidance generalize naturally to this setting, and we focus on functional complexity problems – particularly those that arise by constraining the order dimensions of the pattern and text posets. We show that counting the numb...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013